Abstract

Background: The validity, reliability and cross-country comparability of summary measures of population health 
(SMPH) have been persistently debated. In this debate, the measurement and valuation of nonfatal health 
outcomes have been defined as key issues. Our goal was to quantify and decompose international differences in 
health expectancy based on health-related quality of life (HRQoL). We focused on the impact of value set choice 
on cross-country variation. 

Methods: We calculated Quality Adjusted Life Expectancy (QALE) at age 20 for 15 countries in which EQ-5D 
population surveys had been conducted. We applied the Sullivan approach to combine the EQ-5D based HRQoL 
data with life tables from the Human Mortality Database. Mean HRQoL by country-gender-age was estimated using 
a parametric model. We used nonparametric bootstrap techniques to compute confidence intervals. QALE was 
then compared across the six country-specific time trade-off value sets that were available. Finally, three 
counterfactual estimates were generated in order to assess the contribution of mortality, health states and health- 
state values to cross-country differences in QALE. 

Results: QALE at age 20 ranged from 33 years in Armenia to almost 61 years in Japan, using the UK value set. The 
value sets of the other five countries generated different estimates, up to seven years higher. The relative impact of
choosing a different value set ranged from 2% to 20% across country-gender strata. In 50% of the country-gender
strata the ranking changed by two or more positions across value sets. The decomposition demonstrated a
varying impact of health states, health-state values, and mortality on QALE differences across countries. 

Conclusions: The choice of the value set in SMPH may seriously affect cross-country comparisons of health 
expectancy, even across populations of similar levels of wealth and education. In our opinion, it is essential to get 
more insight into the drivers of differences in health-state values across populations. This will enhance the 
usefulness of health-expectancy measures. 



Background

Summary measures of population health (SMPH) have been calculated to represent the health of a
particular population in a single number, combining information on fatal and nonfatal health
outcomes [1,2]. SMPH have been applied to various purposes, e.g., to monitor changes in
population health over time, to compare population health across countries, to investigate
health inequalities (the distribution of health within a population), and to quantify the
benefits of health interventions in cost effectiveness analyses [3-5]. In this study, we focus
on using SMPH to compare the level of health across populations. Although different types of
SMPH have been developed [6-10], they usually comprise three elements: information on
mortality, nonfatal health outcomes, and health-state values. Health-state values reflect the
impact of nonfatal health outcomes on a cardinal scale, commonly comprising a value of 1 for
full health and a value of 0 for a state equivalent to death. In SMPH, the number of years
lived in a particular population (taken from life tables) is combined with information on the
(proportional) prevalence of health states or diseases and the value of these nonfatal health
outcomes to obtain the number of healthy life years lived. The value sets provide the link
between the information on nonfatal health outcomes and the information on mortality.

There has been much debate on SMPH, in particular 
regarding the validity, reliability, and cross-country com- 
parability of different methods. A complete discussion on 
the pros and cons of different methods is beyond the 
scope of this paper and can be found elsewhere [6,11,12]. 
In short, crucial and persistent issues have been the mea- 
surement and valuation of nonfatal health outcomes and 
the incorporation of other values such as discounting or 
equity. In cases where SMPH are used to compare popu- 
lation health across countries, it is essential to use the 
same concepts and measurement methods for mortality, 
nonfatal health outcomes, and value sets across countries. 
Furthermore, it is crucial to understand in what way the 
method chosen may affect cross-country variation in the 
summary measure. 

In this study we performed a cross-country compari- 
son of Quality Adjusted Life Expectancy (QALE). We 
included information on health-related quality of life 
(HRQoL) to represent nonfatal health outcomes. EQ-5D 
(HRQoL) population surveys were used, and we 
included the 15 countries in which an EQ-5D popula- 
tion survey had been conducted. The EQ-5D is a stan- 
dardized and validated questionnaire for measuring 
HRQoL. It comprises five dimensions such as mobility 
and self-care. The information on HRQoL, in combina- 
tion with one of the available value sets, can be used to 
calculate QALE. As far as we know, a HRQoL-based 
approach has rarely been used in SMPH [1], particularly 
in international comparisons. The approach may prove 
interesting, since the value sets are calculated on the 
basis of choice-based methods, which have a theoretical 
foundation in economic theory [13]. Furthermore, data 
requirements of an EQ-5D type of instrument may be 
limited compared to other approaches such as using dis- 
ease prevalence, particularly in international compari- 
sons [14,15]. There are several other validated HRQoL 
instruments besides the EQ-5D, such as the SF-36 and 
the Health Utility Index mark 2 and mark 3 (HUI-2 and 
HUI-3) [16-18]. Muennig et al. used EQ-5D data to esti- 
mate Health Adjusted Life Years (HALY) in the Ameri- 
can population [19]. They found differences across 
income groups, yet they did not provide insight into the 
uncertainty in their estimates. In Canada, the HUI was 
used to calculate a national SMPH [20,21]. Feeny et al. 
used the HUI-3 and a single Canadian value set to com- 
pare health expectancy between Canada and the US 
[21]. Significant health differences between the two 
countries were found. Health-state profiles have also 
been included in SMPH in combination with informa- 
tion on diseases and disability [7]. 

Our first aim was to provide more empirical evidence 
on international differences in HRQoL-based health
expectancy. Additionally, we aimed to explore the impact
of the value set choice. In the context of international 
comparisons, a choice has to be made between country- 
specific values and cross-country (global) values. The 
issue of value set choice has not been extensively dis- 
cussed in the literature, however. It can be argued that if 
SMPH serve (international) health system performance 
assessments, country-specific value sets are preferred. 
Health systems should deliver outcomes in accordance 
with the preferences of the population they serve and 
whose means are put in use. Country-specific value sets 
may not always be available, however. Some have used 
foreign value sets, e.g., from neighboring countries. For 
example, Feeny et al. compared health-utility-based 
health expectancy between the US and Canada using the 
Canadian value set for both countries [21]. The authors
noted this as a limitation because the true preferences
of the US population may not exactly resemble the Canadian
values. Some have used a single global value set in
international comparisons. For example, Mathers et al. 
calculated Health Adjusted Life Expectancy (HALE) by 
combining data on disease incidence (from the WHO 
Global Burden of Disease [GBD] study) with, for a subset 
of countries, survey data on health states [7]. Global 
value sets were applied to both the diseases (values were 
called severity weights in this context) and the health 
states. International comparisons of disability-adjusted 
life years (DALYs) and of disability-adjusted life expec- 
tancy (DALE) also used a single value set across countries 
[22-24]. It has been argued that the valuation of health 
domains shows reasonable consistency across countries, 
justifying the use of a global value set from an empirical 
perspective [25]. Nevertheless the need for more empiri- 
cal evidence was acknowledged. Others did find differ- 
ences in disease/disability-related values across countries 
and raised doubts about the universality of health values 
[26]. Another consideration that could support the use of
global values is that identical interventions on identical 
patients will result in different benefits if different value 
sets are used. For example, less-healthy (poorer) popula- 
tions may experience a smaller impact of health problems 
and a smaller benefit from interventions because they are 
unaware of better health outcomes. In other words, dif- 
ferences in values and expectations would determine sys- 
tem performance and could also alter resource allocation 
decisions across populations in a way that may be consid- 
ered undesirable. 

In summary, the literature has demonstrated a need to 
improve the understanding of differences in the valuation 
of health, also in the context of international compari- 
sons of SMPH [25-27]. We aimed to provide more 
empirical evidence on the impact of value sets on cross- 
country differences in health expectancy. Furthermore, 
we aimed to discuss these results in the context of the
theoretical and methodological issues that have been
raised in the literature. 

Methods 

Data 

We calculated QALE in 15 countries using individual-level 
EQ-5D survey data (provided by the EuroQol Group) and life
tables from the Human Mortality Database (HMD) [28]. 
The HMD did not provide life tables for Armenia and 
Greece, for which we instead used WHO life tables [29]. 
The countries were selected on the basis of EQ-5D data 
availability. The EQ-5D surveys were conducted between 
1993 and 2002 (see Additional file 1). All surveys used the 
standard EQ-5D setup. The translation process of the 
EQ-5D surveys followed the guidelines proposed in the 
international literature [30]. Survey respondents were non- 
institutionalized persons older than 18 years. Sample size 
varied between 400 and 10,000 observations per country 
(see Additional file 1). We excluded 2,989 observations 
with missing values in at least one of the EQ-5D dimen- 
sions because HRQoL could not be calculated in these 
cases. Consequently 41,562 observations/individuals 
remained in the pooled dataset. We used life tables from 
the year 2000 for all countries. 

The value sets used to weight health states were all 
based on the time trade-off (TTO) elicitation technique 
and were taken from the literature. TTO-based valua- 
tion studies had been conducted in Germany, Japan, the 
Netherlands, Spain, the UK, and the US (see Table 1) 
[16,31-35]. The TTO method is considered the most 
appropriate (consistent) method to elicit preferences, 
compared to the Standard Gamble technique or the 
Visual Analogue Scale, for example [36]. 

HRQoL 

The EQ-5D comprises five domains: mobility, self-care, 
usual activities, pain/discomfort, and anxiety/depression. 
Each domain contains three levels: no problems (1), 
some problems (2), and extreme problems (3). For 
example, a respondent may report no problems in mobi- 
lity, self-care, usual activities, and pain/discomfort, and 
some problems in anxiety/depression. Generally the five
answers are transformed into a single HRQoL index as 
follows: 



\[
\mathrm{HRQoL} = 1 - \Big( \sum_{j,k} \alpha_{cjk}\, d_{jk} + \beta_c\, \mathrm{N2} + \gamma_c\, \mathrm{N3} \Big) \tag{1}
\]

where $\alpha_{cjk}$ = value of EQ-5D domain j and level k for country c; $d_{jk}$ = dummy for
health state j and level k; $\beta_c$ = value of having some or severe problems in at least
one health domain (dummy N2) for country c; and $\gamma_c$ = value of having severe problems
in at least one health domain (dummy N3) for country c.

Table 1 Characteristics of the TTO value sets

Country        Reference          Elicitation year    Minimum HRQoL
Germany        Greiner (2005)     1997-1998           -0.205
Japan          Tsuchiya (2002)    1998                -0.111
Netherlands    Lamers (2005)      2003                -0.329
Spain          Badia (2001)       1996                -0.654
UK             Dolan (1997)       1993                -0.594
US             Shaw (2005)        2002                -0.102
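As an illustration of equation (1), the following minimal Python sketch computes the index for a single EQ-5D profile. The coefficient values are an assumption on our part: they are the figures commonly cited for the UK TTO tariff and are not listed in this paper, so they should be checked against the original valuation study [31-35] before any real use. The names `hrqol_index`, `UK_ALPHA`, `UK_BETA`, and `UK_GAMMA` are hypothetical.

```python
# Illustrative sketch of equation (1). The coefficients below are an ASSUMPTION:
# they are the commonly cited UK TTO tariff values, not numbers taken from this
# paper. Verify them against the original valuation study before use.
DOMAINS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)

# alpha_{cjk}: decrement for domain j at level k (level 1 carries no decrement).
UK_ALPHA = {
    "mobility": {2: 0.069, 3: 0.314},
    "self_care": {2: 0.104, 3: 0.214},
    "usual_activities": {2: 0.036, 3: 0.094},
    "pain_discomfort": {2: 0.123, 3: 0.386},
    "anxiety_depression": {2: 0.071, 3: 0.236},
}
UK_BETA = 0.081   # beta_c: applied when any domain is at level 2 or 3 (dummy N2)
UK_GAMMA = 0.269  # gamma_c: applied when any domain is at level 3 (dummy N3)


def hrqol_index(profile, alpha=UK_ALPHA, beta=UK_BETA, gamma=UK_GAMMA):
    """Return the HRQoL index for a five-level-tuple EQ-5D profile, e.g. (1, 1, 1, 1, 2)."""
    if len(profile) != len(DOMAINS) or any(lvl not in (1, 2, 3) for lvl in profile):
        raise ValueError("profile must hold five levels, each 1, 2, or 3")
    # Sum of alpha_{cjk} * d_{jk}: only domains above level 1 contribute.
    decrement = sum(alpha[dom][lvl] for dom, lvl in zip(DOMAINS, profile) if lvl > 1)
    n2 = 1 if any(lvl > 1 for lvl in profile) else 0
    n3 = 1 if any(lvl == 3 for lvl in profile) else 0
    return 1.0 - (decrement + beta * n2 + gamma * n3)
```

Under these assumed coefficients, the worst profile (3, 3, 3, 3, 3) evaluates to the UK minimum reported in Table 1 (-0.594), and the profile with only some anxiety/depression problems, (1, 1, 1, 1, 2), evaluates to 0.848.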

The US value set was based on a different formula 
[35]: 



\[
\mathrm{HRQoL} = 1 - \Big( \sum_{j,k} \alpha_{cjk}\, d_{jk} + \theta_c\, \mathrm{D1} - \gamma_c\, \mathrm{I2square} + \chi_c\, \mathrm{I3} + \psi_c\, \mathrm{I3square} \Big) \tag{2}
\]

where D1 = number of domains with some or extreme
problems beyond the first, I2square equals the square of
the number of domains at level 2 beyond the first, and
I3square equals the square of the number of domains at
level 3 beyond the first. This model was chosen in the
US because it provided the best fit for the data [35].
Additionally, in contrast to the other value sets, the US 
model was meant to take account of the marginal 
changes in HRQoL associated with having some or 
extreme problems in additional domains. 
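The count-based terms of the US model can be derived from a profile as in the short Python sketch below. It follows the definitions of D1, I2square, and I3square given above; the definition of I3 as the number of domains at level 3 beyond the first is our assumption, inferred from the definition of I3square. No US coefficients are included, because they are not reported in this paper. The function name `us_model_terms` is hypothetical.

```python
def us_model_terms(profile):
    """Return the count-based terms of the US model for an EQ-5D profile.

    profile: tuple of five levels (1, 2, or 3), one per EQ-5D domain.
    I3 is ASSUMED to be the number of level-3 domains beyond the first.
    """
    n_level2 = sum(1 for lvl in profile if lvl == 2)
    n_level3 = sum(1 for lvl in profile if lvl == 3)
    d1 = max(n_level2 + n_level3 - 1, 0)  # domains with any problem, beyond the first
    i2 = max(n_level2 - 1, 0)             # level-2 domains beyond the first
    i3 = max(n_level3 - 1, 0)             # level-3 domains beyond the first
    return {"D1": d1, "I2square": i2 ** 2, "I3": i3, "I3square": i3 ** 2}
```

For example, the profile (2, 2, 3, 1, 1) has three domains with problems, so D1 = 2, I2square = 1, and I3 = I3square = 0.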

Equation (1) and equation (2) show that the maximum 
HRQoL equals 1. The values $\alpha_{cjk}$ reflect the HRQoL
reduction associated with having some problems or 
severe problems in each EQ-5D domain. These prefer- 
ences may differ across countries as shown in Table 1 
by the difference in minimum HRQoL (see also 
[34,37,38]). Figure 1 demonstrates the relative value of 
each EQ-5D dimension for the five value sets that are 
based on equation (1). For example, it shows that, compared
to Dutch residents, people in the UK attached greater value
to having some or severe health problems in all domains
except anxiety (see [33]). Consequently, minimum HRQoL was
lower in the UK (-0.594 vs. -0.329).

Figure 1 Value of the EQ-5D domains and levels. The US values
are not shown because they are based on a different formula.

Analysis 

We used the Sullivan approach to combine mortality 
and nonfatal health outcomes and to calculate QALE 
[39]. The life tables comprised current death rates and 
conditional probabilities of death by country, gender, 
and age group (mostly five-year age groups). These 
probabilities were used to calculate the number of life 
years lived per age group for a hypothetical cohort. We 
multiplied the number of life years, as given in the 
HMD life tables, with the mean HRQoL as predicted by 
the parametric model described below, in order to
calculate the number of healthy life years. Finally, the 
total number of healthy life years from age x was 
divided by the number of survivors in the hypothetical 
cohort at age x to calculate QALE at age x. We 
excluded age groups under 20 years, because the EQ-5D 
surveys were conducted among individuals older than 
18 years. In addition, we were unable to differentiate 
HRQoL in the age groups over 85 years, because the 
maximum age of respondents was 90 in almost all sur- 
veys. Equation (3) is a formal representation of the 
QALE. 

\[
\mathrm{QALE}_{c,g,a} = \frac{\sum_{i=a}^{z} \mathrm{LY}_{c,g,i} \cdot \mathrm{HRQoL}_{c,g,i}}{l_{c,g,a}} \tag{3}
\]

where $\mathrm{LY}_{c,g,a}$ equals the total number of life years lived in country c, gender g,
and age group a; $\mathrm{HRQoL}_{c,g,a}$ equals average (predicted) HRQoL by country c,
gender g, and age group a; $l_{c,g,a}$ equals the number of survivors in the life table cohort
for country c, gender g, and age group a; the sum runs over age groups i from a through z;
and z equals the last open-ended age interval of the life table.
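The Sullivan calculation in equation (3) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study; the function name `qale_by_age` and the numbers in the usage note are made up for the example and are not life-table data.

```python
def qale_by_age(life_years, hrqol, survivors):
    """Sullivan-type QALE for every starting age group, following equation (3).

    life_years[i]: life-table person-years lived in age group i
    hrqol[i]:      mean (predicted) HRQoL in age group i
    survivors[i]:  cohort members alive at the start of age group i
    The last list element is taken to be the open-ended age interval z.
    """
    if not (len(life_years) == len(hrqol) == len(survivors)):
        raise ValueError("all inputs must have one entry per age group")
    healthy_years = [ly * q for ly, q in zip(life_years, hrqol)]
    result = []
    remaining = 0.0
    # Accumulate healthy life years from the oldest age group downwards, so that
    # each step holds the total from age group a through z.
    for hy, alive in zip(reversed(healthy_years), reversed(survivors)):
        remaining += hy
        result.append(remaining / alive)
    return result[::-1]
```

With two hypothetical age groups, `qale_by_age([100.0, 50.0], [0.9, 0.8], [10.0, 8.0])` gives (100 x 0.9 + 50 x 0.8) / 10 = 13 healthy years at the first age group and 50 x 0.8 / 8 = 5 at the second.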

$\mathrm{HRQoL}_{c,g,a}$ was calculated in three steps: 1) we calculated
HRQoL at the individual level using equation (1); 2)
we estimated the predicted HRQoL at the individual level 
using a multiple regression model; and 3) we computed 
the mean predicted HRQoL by country, gender, and age. 
In step 2, we estimated a multiple regression model with 
HRQoL as dependent variable (in the range [minimum, 
1]) and age, gender, country dummies, and education 
level as independent variables. We estimated the model 
to fully exploit the information available in the pooled 
dataset and to explore the relationship between HRQoL 
and respondent characteristics (Additional file 2 shows 
that there is almost no difference between QALE using 
observed HRQoL and QALE using predicted HRQoL). 
Previous studies have shown that HRQoL is associated 
with demographic and socioeconomic characteristics
such as age, gender, education, income, and race (e.g.,
[19,40-42]). The EQ-5D surveys provided information on 
the respondents' age (the average age was 47 in the 
pooled dataset), gender (46% male), country, and level of 
education (primary education 31%, secondary education 
57%, and university level 12%). The variables socioeco- 
nomic status and smoking status were not used because 
of high nonresponse rates (43% and 47% respectively). It 
was expected that the relationship between HRQoL and, 
for example, age differed by gender and country. There- 
fore interaction terms between country, gender, and age 
were included in the model. We used nonparametric 
bootstrap techniques to calculate 95% confidence inter- 
vals. As discussed in Pullenayegum et al., regression 
models that use this type of outcome measure need to 
take heteroscedasticity and a nonnormal distribution into 
account [43]. Pullenayegum et al. showed that OLS 
regression with nonparametric bootstrap can give 'accep- 
table adequacy' of the confidence intervals with these 
data. We also tested alternative models, a tobit model 
and a two-part model, which have been used to model 
skewed and truncated data. The outcomes of these mod- 
els did not alter the main results and conclusions (these 
regression results can be obtained through the corre- 
sponding author). 
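The nonparametric bootstrap mentioned above can be illustrated with a minimal Python sketch. It makes two simplifying assumptions that are not taken from the paper: it resamples individual HRQoL scores and summarizes only their mean (the study bootstraps the full QALE estimate), and it uses the percentile method to form the interval. The function name `bootstrap_ci` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    ASSUMPTION: percentile method; the paper does not state which bootstrap
    interval was used.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, keeping the original sample size each time.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[round(tail * n_resamples)]
    upper = means[round((1.0 - tail) * n_resamples) - 1]
    return lower, upper
```

Because resampling makes no distributional assumption, the same routine applies to skewed and bounded outcomes such as HRQoL; to mirror the study, the mean would be replaced by the complete regression and QALE calculation inside each resample.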

Finally, we computed counterfactual estimates in order 
to explore the contribution of mortality, health states, 
and health-state valuation to cross-country variation in 
QALE. In this part of the study, we only included the 
six countries for which value sets had been established 
(Table 1). As a result, six sets of counterfactual esti- 
mates were generated. In each set, a different country 
was used as reference country. Suppose we use Germany 
as reference country. Then, we imputed mortality rates, 
health-state profiles, and values from Germany into 
QALE of, for example, Spain. Subsequently, we investi- 
gated the associated change in QALE for Spain in com- 
parison to QALE based on Spanish mortality, health 
states, and values. 

In the first counterfactual estimate we used country-specific value sets, country-specific
EQ-5D health states, and death rates of the reference country. In other words, we imputed LY
and l of the reference country in equation (3). The difference between this counterfactual
QALE and the original QALE (based on country-specific mortality, health states, and values)
revealed the contribution of mortality. With the second counterfactual QALE we estimated the
impact of health states using country-specific value sets, country-specific death rates, and
EQ-5D health states of the reference country. Now the HRQoL component in equation (3) was
based on country-specific values ($\alpha_{cjk}$) and on the health state profiles ($d_{jk}$)
of the reference country. The difference between this counterfactual QALE and the original
QALE showed the contribution of health states. The third counterfactual estimate comprised
country-specific EQ-5D health states, country-specific death rates, and the value set of the
reference country. We imputed the values $\alpha$ of the reference country in equation (1).
Subsequently, QALE was estimated using equation (3) and the difference between this
counterfactual QALE and the original QALE demonstrated the impact of value sets.
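The three counterfactual estimates can be sketched as follows in Python. This is a simplified illustration under stated assumptions: health states are reduced to named states with prevalence shares per age group, each value set is a plain mapping from state to HRQoL, and the contribution is reported as counterfactual minus original QALE. The names `qale` and `decompose` and all numbers are hypothetical, not data from the study.

```python
def qale(life_years, survivors, prevalence, values):
    """QALE at the first age group: healthy life years divided by survivors.

    prevalence[i] maps each health state to its share in age group i;
    values maps each health state to its HRQoL value.
    """
    total = 0.0
    for ly, shares in zip(life_years, prevalence):
        mean_hrqol = sum(share * values[state] for state, share in shares.items())
        total += ly * mean_hrqol
    return total / survivors


def decompose(country, reference):
    """Swap in one reference-country component at a time (counterfactual - original)."""
    def estimate(mortality, states, values):
        return qale(mortality["life_years"], mortality["survivors"], states, values)

    original = estimate(country["mortality"], country["states"], country["values"])
    return {
        "mortality": estimate(reference["mortality"], country["states"], country["values"]) - original,
        "health_states": estimate(country["mortality"], reference["states"], country["values"]) - original,
        "values": estimate(country["mortality"], country["states"], reference["values"]) - original,
    }


# Made-up toy inputs with two age groups and two health states.
country = {
    "mortality": {"life_years": [10.0, 10.0], "survivors": 1.0},
    "states": [{"full": 0.5, "some": 0.5}, {"full": 0.5, "some": 0.5}],
    "values": {"full": 1.0, "some": 0.5},
}
reference = {
    "mortality": {"life_years": [12.0, 12.0], "survivors": 1.0},
    "states": [{"full": 0.75, "some": 0.25}, {"full": 0.75, "some": 0.25}],
    "values": {"full": 1.0, "some": 0.0},
}
contributions = decompose(country, reference)
```

In this toy case the original QALE is 15 years; imputing reference mortality adds 3 years, imputing reference health states adds 2.5 years, and imputing the reference value set subtracts 5 years. The three contributions need not sum to the total difference between the two countries, because each component is swapped in isolation.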

Results 

Regression results 

Table 2 presents the results of the regression model 
(using UK values). The table shows that HRQoL declined 
with age, although the relationship was not linear (age, 
age squared, and age cubic were jointly significant). The 
gender-age interaction term shows that the age effect dif- 
fered between men and women: the reduction in HRQoL 
over age was somewhat smaller for males. In addition, 
the regression results showed significant country effects 
and cross-country differences in the impact of age and 
gender. The country dummies and interaction terms 
were jointly significant. HRQoL was also positively asso- 
ciated with education level. 

QALE 

Figure 2 shows QALE at age 20 by country and gender 
(using UK values). It shows that QALE at age 20 ranged 
from 33 years in Armenia (males) to almost 61 years in 
Japan (females). The figure shows that QALE at age 20 
years was higher for females than for males. Only 
Greece showed a higher male QALE, yet the confidence 
intervals of the two genders largely overlapped for this 
country. The absolute gender difference in QALE ran- 
ged between 1.6 years in the US and 4.6 years in 
Slovenia. 

Value set choice 

The preceding results were calculated using the UK value
set in all countries. Table 3 presents QALE using
different value sets. The table shows that the UK value
set generated the lowest QALE in most (67%) of the 
country-gender strata. The German value set generated 
the highest QALE in all country-gender strata, with a 
maximum difference of 7.2 healthy years (difference in 
QALE between the German value set and the UK value 
set for females in Armenia). The US value set consis- 
tently showed the second-highest QALE. In 60% to 70% 
of all country-gender strata, the Spanish value set ranked 
third, the Dutch value set ranked fourth, the Japanese 
value set ranked fifth, and the UK value set ranked sixth. 
The relative change in QALE, as a result of a change in 
value set choice, varied between countries. For example, 
the difference in QALE between the German value set 
and the UK value set was close to 3% for Japanese males, 
but more than 20% for Armenian females. We also added
a country ranking (R) by value set and by gender. The
countries at the top end and low end of the ranking
showed a stable position across value sets. In between,
the ranking of the countries was affected to some extent.
Around 50% of the country-gender strata moved two or
more rank positions across value sets. Notable rank
changes were found for Belgium (females), Canada
(females), Finland (females), Greece (males), and Sweden
(males).

Table 2 Regression results^1

Main effects     Coef.        P>|z|        Interaction terms     Coef.        P>|z|
Age              -0.069       0.000        Gender*age            -0.003       0.000
Age squared       0.003       0.004
Age cubic        -0.000       0.002        Belgium*age            0.028       0.000
                                           Canada*age             0.027       0.000
Education^2       0.040       0.000        Finland*age            0.024       0.000
Gender^3          0.010       0.555        Germany*age            0.026       0.000
                                           Greece*age             0.020       0.000
Belgium          -0.114       0.003        Hungary*age            0.018       0.000
Canada           -0.107       0.000        Japan*age              0.032       0.000
Finland          -0.078       0.010        Netherlands*age        0.031       0.000
Germany          -0.086       0.009        New Zealand*age        0.027       0.000
Greece            0.018       0.700        Slovenia*age           0.020       0.000
Hungary          -0.025       0.372        Spain*age              0.029       0.000
Japan            -0.085       0.042        Sweden*age             0.033       0.000
Netherlands      -0.125       0.000        UK*age                 0.026       0.000
New Zealand      -0.104       0.003        US*age                 0.025       0.000
Slovenia         -0.114       0.003
Spain            [illegible]  [illegible]  Belgium*gender        [illegible]  [illegible]
Sweden           -0.189       0.000        Canada*gender         -0.015       0.490
UK               -0.094       0.001        Finland*gender         0.008       0.689
US               -0.132       0.000        Germany*gender        -0.008       0.724
                                           Greece*gender         -0.017       0.496
                                           Hungary*gender        -0.024       0.160
                                           Japan*gender          -0.009       0.701
                                           Netherlands*gender    -0.015       0.397
                                           New Zealand*gender     0.015       0.502
                                           Slovenia*gender        0.019       0.367
                                           Spain*gender          -0.024       0.158
                                           Sweden*gender          0.036       0.037
                                           UK*gender              0.023       0.215
                                           US*gender             -0.014       0.447
Constant          1.138
Adj R-squared     0.16
N                 40,650

^1 Standard errors were calculated using nonparametric bootstrap technique.
^2 Education levels: 1 = low (primary); 2 = medium (secondary); 3 = high (university).
^3 Gender: 0 = male; 1 = female.

Figure 2 Quality Adjusted Life Expectancy at 20 years by country
and gender. Confidence interval based on nonparametric
bootstrap technique. Blue: females, Red: males.



QALE decomposition 

Counterfactual estimates were generated in order to 
explore the role of mortality, health states, and health- 
state values in cross-country differences. Figure 3 
demonstrates the results. Each of the six countries 
involved (Germany, Japan, Netherlands, Spain, UK, and 
US) appears once as reference country in the counter- 
factual scenarios. As a result, six figures are shown. The 
figure demonstrates that the impact of the different 
QALE components varied substantially across countries. 
For example, the top-left graph demonstrates the contri- 
bution of mortality, EQ-5D health states, and health- 
state values to the difference in QALE with the UK. It 
shows that mortality rates explained the major part of 
the QALE difference with the UK for Japanese females 
and Spanish females. Differences in terms of valuation 
explained most of the difference in QALE with the UK 
for Germany and the US. Differences in EQ-5D health 
states explained the greater part of the variation in 
QALE for males in Japan, the Netherlands, and Spain. 
The figure shows that the differences in QALE with 
Germany are largely explained by the valuation compo- 
nent for all countries. 

Discussion and conclusions 

In this study we performed an international comparison 
of HRQoL-based health expectancy. We found that 
QALE at age 20 ranged between 33 years in Armenia 
and almost 61 years in Japan. Generally, female QALE 
was higher than male QALE within this set of countries. 
In terms of QALE, Hungary and Slovenia performed 
better than Armenia, yet worse in comparison to the 
other countries. The relatively low health expectancy for 
a country such as Armenia may be expected given its
lower levels of health spending and national income and
its different socioeconomic circumstances. The United 
States performed worse in terms of QALE compared to 
the other western high-income countries in the dataset. 
Many studies have found such unfavorable health out- 
comes in the US and several explanations for this phe- 
nomenon have been given, such as an inefficient health 
care system, substantial disparities in the population in 
terms of access to health care, or behavioral factors 
(unhealthy diets) [44,45]. 

In the final part of the analysis, we decomposed the dif- 
ference in QALE using counterfactual scenarios. It was 
shown that the relative contribution of mortality, health 
states, and health-state values differed among countries. 
For example, the high QALE for Japanese males was to a 
large extent a result of a low prevalence of health pro- 
blems in EQ-5D domains. In turn, the better average 
health of Spanish females was largely explained by lower 
mortality rates. Interestingly, in various cases the EQ-5D 
profiles showed a greater contribution to differences in 
QALE than differences in mortality. Lower mortality did 
go hand in hand with better HRQoL, although there 
were exceptions. For example, Dutch females had a lower 
life expectancy than Spanish females, yet they experi- 
enced fewer health problems in EQ-5D domains. As a 
result, the difference in HRQoL-based health expectancy 
was smaller than the difference in life expectancy 
between these two countries. The decomposition con- 
firmed that international comparisons of health expec- 
tancy, based on country-specific values, are influenced 
substantially by differences in value sets. 

Differences in health expectancy across countries may 
stem from various factors, among which methodological 
issues and cultural differences play a role. Amid the three 
main SMPH elements (mortality, nonfatal health out- 
comes, and valuation) we focus on the value sets first. A 
remarkable result was the difference in QALE across the 
six TTO value sets. The German value set generated 
QALE up to seven years higher than the UK value set. 
The ranking of countries varied to a lesser extent across 
value sets, particularly in the high-performing or low- 
performing countries. We did find rank switches in the 
group of average performers. This may be expected 
because the differences in QALE were relatively small in 
this middle group, showing various overlapping confi- 
dence intervals (see Figure 2). Therefore, the ranking of 
these country-gender strata is particularly sensitive to the 
value-set choice. Around 50% of the country-gender 
strata showed a rank-change of two or more positions 
across value sets. Interestingly, the relative change in 
QALE associated with the value set choice differed across 
countries. The impact was greatest in low-performing 
countries such as Armenia, Hungary, and Slovenia. 
We also found that the ranking of countries did not
consistently improve when local values were used. For
example, Germany did not reach a higher rank in the
German value set compared to the ranking in which
Japanese values were used.

Table 3 QALE at age 20 years using different value sets plus a country ranking (R)^1

           Value set Germany     Value set Japan       Value set Netherlands
           QALE       R          QALE       R          QALE
Males
ARM        39.13      15         36.93      15         34.91
BEL        50.88       9         47.22      10         48.45
CAN        52.72       2         49.00       5         49.89
FIN        49.71      11         46.35      12         48.00
GER        50.68      10         48.21       9         49.24
GRE        51.20       7         50.17       4         49.95
HUN        44.34      14         41.83      13         42.07
JAP        56.14       1         54.68       1         55.19
NET        52.60       5         50.25       3         51.33
NZL        52.27       6         48.82       6         50.13
SLV        46.04      13         41.36      14         42.74
SPA        52.66       3         50.43       2         51.17
SWE        52.63       4         48.37       8         49.11
UK         50.93       8         48.60       7         48.95
US         49.67      12         46.61      11         47.33
Females
ARM        42.74      15         39.43      15         37.03
BEL        55.14       7         50.77      10         52.24
CAN        55.50       5         50.83       9         52.05
FIN        54.70      10         50.95       8         52.69
GER        55.12       8         51.22       7         52.12
GRE        51.41      13         49.98      11         50.23
HUN        49.69      14         46.01      14         45.65
JAP        61.01       1         58.68       1         59.53
NET        55.35       6         52.10       4         53.44
NZL        56.45       4         51.99       5         53.55
SLV        51.88      12         46.03      13         47.64
SPA        56.67       3         53.80       2         53.93
SWE        56.75       2         52.97       3         53.70
UK         54.98       9         51.75       6         52.27
US         52.45      11         48.92      12         49.18

[The ranking for the Netherlands value set and the columns for the Spain, UK, and US value sets are missing from the source text.]

^1 QALE in bold where country-specific values were used.
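The country rankings and rank changes discussed here can be recomputed from the QALE values with a short Python sketch. The male QALE figures below are copied from the German and Japanese value-set columns of Table 3; the function name `rank_countries` and the variable names are hypothetical.

```python
# Male QALE at age 20 from Table 3: (German value set, Japanese value set).
QALE_MALES = {
    "ARM": (39.13, 36.93), "BEL": (50.88, 47.22), "CAN": (52.72, 49.00),
    "FIN": (49.71, 46.35), "GER": (50.68, 48.21), "GRE": (51.20, 50.17),
    "HUN": (44.34, 41.83), "JAP": (56.14, 54.68), "NET": (52.60, 50.25),
    "NZL": (52.27, 48.82), "SLV": (46.04, 41.36), "SPA": (52.66, 50.43),
    "SWE": (52.63, 48.37), "UK": (50.93, 48.60), "US": (49.67, 46.61),
}


def rank_countries(qale_by_country):
    """Rank countries from highest (1) to lowest QALE; assumes no ties."""
    ordered = sorted(qale_by_country, key=qale_by_country.get, reverse=True)
    return {country: position for position, country in enumerate(ordered, start=1)}


german_ranks = rank_countries({c: v[0] for c, v in QALE_MALES.items()})
japanese_ranks = rank_countries({c: v[1] for c, v in QALE_MALES.items()})

# Countries whose male rank shifts by two or more positions between the two value sets.
moved_two_or_more = sorted(
    c for c in QALE_MALES if abs(german_ranks[c] - japanese_ranks[c]) >= 2
)
```

For German males this gives rank 10 under the German value set and rank 9 under the Japanese value set, matching the rankings in Table 3.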

In the literature, the variation in health valuation has 
largely been explained by methodological differences 
across valuation studies and differences in the level of 
wealth and the level of education among populations 
[27]. In our case the available value sets represented the 
preferences of Western countries of similar levels of edu- 
cation and similar levels of wealth. Although we cannot 
exclude that methodological differences played a role, we 
argue that these cannot fully explain the variation that 
was found (see also [46]). All studies were conducted 
using face-to-face interviews, applied the TTO technique 
to elicit values, and included nationally-representative 
samples. In order to determine the valuation function, 
they used similarly specified least squares regression 
models representing the relationship between the TTO 
outcome and EQ-5D domains-levels and took account of 
within-individual error correlation [46]. The main differ- 
ence was the model used in the US, which included a dif- 
ferent specification of the N2 and N3 interaction terms 
and the marginal HRQoL effects. The US value set took 
account of a decrease in the marginal reduction in 
HRQoL associated with further increases in the number 
of domains with any problems or extreme problems. Still, 
the extent to which the US valuation function generated 
different HRQoL scores not only depended on the inter- 
action terms and marginal effects, but also on the values 
attached to the individual domains and levels. Additional
file 3 shows for each value set the HRQoL score
associated with certain health states to exemplify the
differences.

Figure 3 Contribution of mortality, EQ-5D health states and value sets to cross-country differences in QALE. The y-axis shows the
difference in quality-adjusted life years between the QALE that comprised country-specific components and each counterfactual estimate. Blue:
mortality, Red: health states, Green: values.
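
The valuation functions described above can be illustrated with a minimal sketch. It assumes a simplified additive form: a constant for any departure from full health, one decrement per domain at level 2 or level 3 (here the same for every domain, which real value sets do not assume), and an N3-style term that is assumed to apply when any domain is at level 3. All coefficients and the names `make_value_set`, `value_set_a` and `value_set_b` are hypothetical and are not taken from any published value set.

```python
# Illustrative only: every coefficient below is hypothetical, not a published tariff.
DOMAINS = ("mobility", "self_care", "usual_activities",
           "pain_discomfort", "anxiety_depression")


def make_value_set(constant, level2, level3, n3):
    """Build an additive valuation function for EQ-5D states (levels 1-3 per domain)."""

    def value(state):
        if len(state) != len(DOMAINS):
            raise ValueError("state needs one level per domain")
        if all(level == 1 for level in state):
            return 1.0  # full health is anchored at 1
        loss = constant  # decrement for any departure from full health
        for level in state:
            if level == 2:
                loss += level2
            elif level == 3:
                loss += level3
        if 3 in state:
            loss += n3  # extra decrement when any domain is at level 3
        return 1.0 - loss

    return value


# Two hypothetical value sets applied to the same health states.
value_set_a = make_value_set(constant=0.08, level2=0.05, level3=0.20, n3=0.25)
value_set_b = make_value_set(constant=0.05, level2=0.03, level3=0.15, n3=0.10)

for state in [(1, 1, 1, 1, 1), (1, 1, 2, 2, 1), (3, 3, 3, 3, 3)]:
    print(state, round(value_set_a(state), 3), round(value_set_b(state), 3))
```

Even in this toy setting the same health state receives a different HRQoL score under each set, and the gap widens for severe states, where the level 3 and N3-style terms dominate.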

Consequently, we argue that a more conceptual discus- 
sion is needed. Cross-country variation in values may 
reflect cultural differences or differences in the availability 
of certain social services (and therefore the perceived/ 
expected impact of health impairments). Naturally, health- 
state values also differ among individuals [47]. It may be 
argued that national or global value sets should cover this 
within-population variation in terms of values. In other 
words, the samples in elicitation studies need to be repre- 
sentative along the relevant population characteristics 
(similar to the other elements of SMPH). The cross- 
national differences in values need to be taken into 
account in the context of health-system-performance 
assessments and international comparisons of population 
health. In such studies, country-specific value sets may be 
preferred, since each health system should deliver outcomes
according to the preferences of the population it
serves and whose resources it uses. Moreover, the
varying impact of health problems across countries needs 
to be accounted for. Some previous international compari- 
sons of SMPH have used global value sets, based on the 
argument that health values are reasonably consistent 



across countries. However, the result of this study, similar
to, for example, Üstün et al. [26], points to the contrary
and shows that variation in values may affect SMPH outcomes.
A drawback of using country-specific value sets is
that they may not always be available, as was experienced
in this study and in previous studies (e.g. [21]). In our opinion,
the best solution in that case is to calculate health expectancy
using several foreign value sets and to compare the results
(as in Table 3). Additionally, the use of country-specific
value sets in international comparisons may deserve close 
scrutiny from an equity perspective, particularly if there is 
a relationship among values, true health status, and level 
of wealth. Populations with less exposure to what constitu- 
tes "full health" may assign lower values, i.e., a smaller loss 
in terms of HRQoL, to certain health problems. As a 
result, a particular health intervention will generate fewer 
benefits in these populations. From an equity perspective, 
this may be considered undesirable. This argument has 
not been tested empirically though, and may be less rele- 
vant when only high-income countries of similar levels of 
health are included, as in our study. 
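
The sensitivity check suggested above (recalculating health expectancy under several value sets and comparing the outcomes) can be sketched as follows. The figures are the first two QALE columns of four female rows as they appear in the table above. Which value set each column represents is not stated in this section, so the labels `column_a` and `column_b` are placeholders, and the function names are hypothetical. Ranks are recomputed within this four-country subset and therefore differ from the 15-country ranks in the table.

```python
def rank_countries(qale_by_country):
    """Rank countries from highest (1) to lowest QALE."""
    ordered = sorted(qale_by_country, key=qale_by_country.get, reverse=True)
    return {country: position for position, country in enumerate(ordered, start=1)}


def rank_shifts(qale_by_value_set):
    """Largest change in rank for each country across the supplied value sets."""
    ranks = [rank_countries(qale) for qale in qale_by_value_set.values()]
    countries = ranks[0].keys()
    return {c: max(r[c] for r in ranks) - min(r[c] for r in ranks) for c in countries}


# QALE at age 20, females; column labels are placeholders (see lead-in).
qale = {
    "column_a": {"BEL": 55.14, "CAN": 55.50, "FIN": 54.70, "GER": 55.12},
    "column_b": {"BEL": 50.77, "CAN": 50.83, "FIN": 50.95, "GER": 51.22},
}

for name, values in qale.items():
    print(name, rank_countries(values))
print("rank shift:", rank_shifts(qale))
```

In this subset every country moves two positions between the two columns, which shows how closely spaced QALE estimates can reorder when the value set changes.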

The issue of value-set choice not only pertains to 
HRQoL-based health expectancy. All SMPH using mul- 
tiple health states, diseases, levels of disability, or other 
morbidity measures use a valuation function or a set of
weights. Only measures such as disability-free life expectancy
do not comprise value sets. Such approaches classify
people into two groups: with or without disability or
disease. In that case, the proportion without any disability
is simply multiplied by the number of life years lived
in a particular stratum. Obviously, these are rather crude
methods that neglect differences in severity levels.
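
The contrast between the two kinds of measure can be made concrete with a minimal sketch of the Sullivan calculation. The life-table figures, the four age strata and the weights below are invented for illustration, and `sullivan_expectancy` is a hypothetical helper name. The only difference between the two calls is the weight per stratum: a proportion without disability in the first, a mean HRQoL value in the second.

```python
def sullivan_expectancy(person_years, survivors_at_start, weights):
    """Health expectancy at the first age stratum.

    person_years: life-table person-years lived (L_x) per age stratum
    survivors_at_start: life-table survivors (l_x) at the start of the first stratum
    weights: per-stratum weight in [0, 1], e.g. the proportion without
             disability or the mean HRQoL value
    """
    if len(person_years) != len(weights):
        raise ValueError("need one weight per age stratum")
    weighted_years = sum(L * w for L, w in zip(person_years, weights))
    return weighted_years / survivors_at_start


# Invented toy life table with four age strata.
person_years = [950_000, 900_000, 700_000, 300_000]
survivors = 100_000

life_expectancy = sullivan_expectancy(person_years, survivors, [1, 1, 1, 1])
dfle = sullivan_expectancy(person_years, survivors, [0.95, 0.90, 0.75, 0.50])
qale = sullivan_expectancy(person_years, survivors, [0.93, 0.88, 0.80, 0.65])

print(life_expectancy, round(dfle, 3), round(qale, 3))
```

With the binary weights, a person with a mild problem and a person with an extreme problem count the same; the HRQoL weights let severity enter through the value set.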

Two other issues need to be raised regarding the valuation
part of SMPH. First, an advantage of EQ-5D-type
instruments, particularly when an economic perspective
is required, may be that value sets have been elicited
using a choice-based method (the TTO technique). Economists
generally consider choice-based methods the preferred way
to elicit people's preferences. The
extent to which the elicitation method affects cross-country
differences is largely unknown. Some have
argued that different elicitation methods generate a
rather similar cross-country variation in terms of values,
but more research is needed on this issue [47]. Secondly,
we need to address the question of whose values should 
be used. The value sets we used all represented general 
population values. Various authors have compared popu- 
lation values with patient values [48-51]. From an eco- 
nomic perspective, population values may be preferred, 
since health systems consume public means and should 
therefore allocate their resources and outcomes accord- 
ing to the preferences of the general population [48]. 
However, it was found that the general public attaches a 
much greater loss in terms of HRQoL to particular health 
problems than patients do. Although patients are better 
informed about the impact of morbidity, the adaptation 
effect is present among them [52,53]. Expert opinion has 
also been applied in previous international studies on 
SMPH [24]. The question is to what extent experts are 
able to assess the impact of different health states or dis- 
eases on people in general as well as for different popula- 
tions. As a result this discussion appears unresolved. 

As demonstrated by the decomposition, differences in 
QALE are also affected by differences in health states. 
Two major measurement issues should be discussed in 
this respect. First, although all studies used the same stan- 
dardized EQ-5D instrument, the mode of administration 
differed across studies. It has been shown that telephone 
surveys in particular may generate more positive HRQoL 
scores compared to self- or interviewer-administered surveys
[54]. The surveys included in our study were conducted
as face-to-face interviews (Armenia, Greece, Japan,
Spain, and UK) or self-administered postal questionnaires
(other countries). Only part of the German data was based
on a telephone survey. A second major measurement issue
regarding the measurement of nonfatal health outcomes is 
response heterogeneity. People who are in an objectively 
equal health state may respond differently to the same 



health question. Response heterogeneity can be explained 
by differences in norms and expectations, in awareness, 
and in access to health care across populations. It may 
affect the validity and the cross-population comparability 
of all SMPH using self-reported health data (in terms of 
health states, disability, or disease) [55]. At the same time, 
the effect of response heterogeneity may be somewhat
dampened if similar mechanisms also play a role in the
valuation of nonfatal health outcomes. Some have argued 
that response heterogeneity may be less of a problem if 
different severity levels are included in the morbidity mea- 
sure, since most threshold issues arise at the lower-valued 
mild-severity levels [1]. Moreover, the problem may be 
greater in self-rated general health questions, and some 
authors even used EQ-5D type of questions as more objec- 
tive health measures [56,57]. Still, it remains unclear to 
what extent the reporting of EQ-5D health states, and our 
international comparison, have been subject to response 
bias. Whether response bias in the measurement of 
morbidity is related to the variation in the valuation of 
morbidity needs further investigation. 

From a practical point of view, HRQoL-type of data may 
be preferred, since this approach may turn out to be less 
resource-intensive in terms of data gathering and data 
analysis than, for example, disease-based methods [22]. 
The latter approach requires information on many types 
of diseases and on the impact of all diseases in terms of 
disability. At an international level, data availability may be
limited, which could reduce the accuracy of the results.
Furthermore, the presence of comorbidity complicates 
disease-based calculations [58]. In turn, an advantage of 
disease-based measures may be that clinical records or 
administrative records on the prevalence of diseases 
can be used. Such data do not suffer from self-report 
problems. 

The following should be kept in mind while interpret- 
ing our results. First, the EQ-5D surveys were conducted 
in different years. This also holds for the value sets that 
were used, whereas preferences may change over time. It 
is unclear whether this is the case and to what extent this 
may have affected the results. We did see that value sets 
from similar years still showed substantial differences 
such as those from the Netherlands and the US or those 
from Germany and Japan. Future research could clarify 
to what extent health-related preferences change over 
time. Secondly, certain population groups were not 
included in the EQ-5D samples, such as inhabitants 
younger than 20 years and, in most surveys, people older 
than 85. Therefore we did not calculate QALE at birth 
and were unable to differentiate HRQoL within the 85- 
plus group. In addition, the surveys did not include the 
institutionalized population. However, due to a lack of 
comparable data, it is unclear to what extent this influ- 
enced the cross-country variation. Further, it was unclear 
whether all potential determinants of HRQoL were sufficiently
represented. Thirdly, we did not take uncertainty
in mortality into account because this information was 
not included in WHO life tables. However, there will be 
little uncertainty in life tables given the large population 
size. Consequently, the uncertainty in health expectancy 
particularly arises in the morbidity part of these measures 
[21]. Finally, as discussed before, different researchers
may have used slightly different protocols and analyses,
which may have affected the differences in value sets [46].

In conclusion, we recommend that future international
comparisons of SMPH discuss their value-set choice
thoroughly, including the theoretical and practical
issues, and perform sensitivity analyses where possible
and necessary. In addition, more qualitative research on
the determinants of differences in valuation within and 
across populations is needed. This will improve the 
interpretation and the usefulness of HRQoL-based, and 
other, summary measures of population health. 

Endnote 

1 A simplified example: suppose that the life expectancy
at birth of a population is equal to 80 years. Furthermore,
assume that half of the population lives in perfect
health for 80 years, and the other half lives in an imperfect
health state for 80 years. If the value of this imperfect
health state is 0.5, then half of the population will
live 80 healthy years and half of the population will live
80*0.5 = 40 healthy years. Consequently, health expectancy
of the entire population will be 60 years.
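
The arithmetic of this endnote can be checked with a few lines of code. This is only a sketch of the simplified example, not of the full life-table calculation, and the function name `health_expectancy` is hypothetical.

```python
def health_expectancy(groups):
    """Population health expectancy from (population share, years lived, health-state value)."""
    return sum(share * years * value for share, years, value in groups)


# Half the population in perfect health (value 1.0), half in a state valued at 0.5.
example = [(0.5, 80, 1.0), (0.5, 80, 0.5)]
print(health_expectancy(example))  # 60.0
```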